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Abstract 

Analysis of multivariate data sets from e.g. microarray studies frequently results 
in lists of genes which are associated with some response of interest. The biological 
interpretation is often complicated by the statistical instability of the obtained gene 
lists with respect to sampling variations, which may partly be due to the functional 
redundancy among genes, implying that multiple genes can play exchangeable roles in 
the cell. In this paper we use the concept of exchangeability of random variables to 
model this functional redundancy and thereby account for the instability attributable to 
sampling variations. We present a flexible framework to incorporate the exchangeability 
into the representation of lists. The proposed framework supports straightforward 
robust comparison between any two lists. It can also be used to generate new, more 
stable gene rankings incorporating more information from the experimental data. Using 
a microarray data set from lung cancer patients we show that the proposed method 
provides more robust gene rankings than existing methods with respect to sampling 
variations, without compromising the biological significance. 



1 Introduction 

Since the advent of the microarray technology, high-throughput experiments gen- 
erating vast amounts of data have been ubiquitous in genetics, for studying e.g. 
genome-wide patterns of gene expression and copy number alterations. The out- 
put of univariate analysis of such high-throughput experiments is often a gene list, 
consisting of genes related to some response of interest (e.g. the discrimination be- 
tween groups or a quantitative trait). The gene list can be ordered or unordered (i.e. 
ranking the genes by their association to the response or just listing all genes whose 
association exceeds some threshold) and it can consist of all studied genes or only 
a subset. The challenge is then to interpret the obtained list in a biological context 
to understand the underlying processes and generate biologically valid hypotheses. 
An inherent problem compromising the interpretability of the observed gene lists 
is that they are often highly unstable, both with regards to small changes in the 



underlying data set and with regards to changes in the ranking method (Fortunel 



eram2003| llrizarry et al| |2005j plchiels et a/.H2005| [EnTPor et a/.[ |2006j |Fan et oL 



2006 Boulesteix and Slawski 2009 Abraham et al, 2010). This could be due to 



redundancy in the cell machinery, i.e. the existence of many genes having similar 
functions in the cell and thereby being exchangeable in a given experimental list. In 
this case, the observed gene list depends on the selection of samples in the data set. 
This means that the functional overlap between two lists may be substantial even 
though the actual gene overlap is very small. Other possible causes of the apparent 
instability are noisy measurements and the generally small sample sizes in this type 



of experiments (Ein-Dor et al. . 2006 He and Yu 2010) 



In this paper we propose a method for stabilization of observed gene rankings, 
using information extracted from the experimental data. We employ the concept of 
exchangeability of random variables to quantify the functional redundancy among 
the genes and we propose a general framework for incorporating exchangeability into 
the representation of gene lists. In this framework, each list is represented as a vector 
in M.^ where M is the number of genes in some universal set, typically the genes 
measured by a microarray chip. Each entry of the list vector quantifies the contri- 
bution to the list from one of the genes. This representation allows straightforward 
comparison of any two gene lists by means of any of the conventional measures of 
similarity or dissimilarity defined on X M*^. This is in contrast to previously 
proposed methods for list comparison, which are tailored to compare specific types 
of lists. We show that using the proposed method, we obtain gene rankings that 
are more robust than the original lists against sampling variations in the underlying 
data, without compromising the relevance to the response. 



2 Related work 



The stabilization of gene rankings has attracted considerable interest during the last 
decade. Some authors have addressed the ranking method directly and proposed 
methods providing more robust and accurate ranking results and differential expres- 
sion detection for the "large p, small N" situation which is standard in biomedical 
applications 



le.g. 



Tusher et al. (2001); Perelman et al. (2007)). 



Another way to obtain more stable rankings is to combine the information from 



several different rankings (e.g. 


Rhodes et al. 


( 


2002 


2004 


); 


Breitling et al. 


( 


2004 


); 


DeConde et al. 


(2006 


); 


Hong and Breitling 


(200^ 


); 


Abeel et al. 


(2010D). An overview 


of the most well-known aggregation methods is given by |Boulesteix and Slawski 



(2009). The most straightforward method is to compute some univariate statistic 
for each gene from the set of rankings and re-order the genes by their value of this 
statistic. The statistic can be e.g. the mean of the positions for the gene (e.g. Jurman 



et al., 2008), a rank product of the positions (Breitling et al.. 2004) or the fraction 



of the rankings where the gene is among the top-/c genes for some k (e.g. Pepe et al. 



2003 Jurman et al.. 2008). There are also more complex aggregation methods for 



extracting an optimal top-fc list based on e.g. Markov chains (DeConde et al.. 2006). 



Comparison of gene lists is an essential part of many algorithms, e.g. for en- 



richment analysis of gene sets (e.g. Draghici et al., 2003 Subramanian et al., 2005 



Ein-Dor et al. . 2006 Ackermann and Strimmer , 2009 ) and assessment of the stability 



of gene rankings (e.g. Jurman et al. . 2008, 2010 Abraham et al. . 2010). In general 



each of the existing methods can only be used to compare specific types of lists, e.g. 
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two ordered lists with the same number of genes, or one ordered hst and one short 
unordered hst. Appendix I provides a brief overview of the most weU-known hst 
comparison methods and show that many of them can be cast in the here proposed 
framework if its components are chosen suitably. 



3 Methods 

3.1 Exchangeability of random variables 

Consider a probability triple {Q,J^,P) and let Xi, . . . ,Xm denote random variables 
on Q, taking values in some space M. Given Xi, . . . ,Xm we define the multivariate 
random variable 

XiX...xX^:fi^Mx...xM 

^ ^ ' 

m 

by Xi X ... X Xm(w) = (Xi(ci;), . . . , Xm(a;)). To each random variable Xi x . . . x X^ 
there is an associated measure -Prxix...xx„ defined by 

Prx,x...xX„(A) =P{{uen; Xix ...X Xm{co) E A}) 

for all measurable subsets A C M x . . . x M. 

Conventionally, a finite sequence (Xi, . . . ,Xm) of random variables is called ex- 
changeable if their joint distribution is invariant under permutation of Xi, . . . ,Xm, 
i.e. if 



Prxix...xXm - P^x^,u 



T(l)X---X^7r(m) 

for each vr G Sm (the group of permutations of {!,..., m}). This means that from a 
statistical point of view the order of the variables in the product is completely irrele- 
vant. From this definition, it is clear that any sequence of independent and identically 
distributed (i.i.d.) random variables is exchangeable, but the reverse implication is 



Kingman 


( 


1978 


); 


Aldous 


( 


1985 



The definition of exchangeability given above is rather strong, and we introduce a 
much weaker notion of exchangeability as follows: 

Definition 1. The finite sequence of random variables (Xi, . . . ,Xm) is weakly ex- 
changeable if the null sets of Prxix...xXm are invariant under permutations, i.e. 

-P'"X^(i)X...xX^(„) << -P'^X^(i)X...xX^(„) 

for all n,T E Sm- Here fi « v denotes that the positive measure /i is absolutely 
continuous with respect to the positive measure v. 

It is clear that a finite sequence of random variables (Xi, . . . , X^) that is exchange- 
able is weakly exchangeable, but that the opposite implication is false in general. 

3.2 Measures of exchangeability 

In this section we will discuss some ways to quantify the degree of exchangeability 
for a sequence of random variables. 
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Definition 2. Given a finite sequence of random variables {Xi, . . . ,Xm), tlie total 
exchangeability variation is given by 



pVar 

^XlX...XXm ■~ I Q 



, - 1 ^ 



-P'"X^(i)X...xX„ 



1 >^ 



X...XX. 



T{m) 



Te5m 



Here, denotes tlie total variation of the (real- valued) measure /x. 



We note that P^fx...xx. 



: iff the sequence (Xi, . . . , X^) is exchangeable. 
We now turn to a discrete probability space (fi, J^, P), where is a finite set, F = 
2^ is the cr-algebra consisting of all events and P : J-" — )■ [0, 1] is a probability measure. 



We let Xi, . . . , Xm be random variables on f2 taking values in 
support of the random variable Xi x . . . x X^ is defined by 



suppXiX. . .xXm ■■= {{qi, . . . G M X . . . X M; Pr 



XlX...XX,; 



{!,..., M}. The 



{{{qi,---,qm)})>0} 



For finite sequences of discrete random variables, Definition[T]implies that (Xi, . . . , X„ 
is weakly exchangeable iff 



SUpp (Xi X . . . X Xm) = SUpp (X^(i) 



X . . . X 



-^7r{m)) 



for all TT G 5*^. Therefore, to quantify the degree of weak exchangeability for a 
sequence of discrete random variables we will compare the support of the joint dis- 
tributions. 

Let p : (M X . . . X M) X (M X . . . X M) M be a metric and define the distance 
between two sets A, P C M x . . . x M by 

distp(y4, P) := min p(a,b). 

aeA,beB 

Furthermore, define the Hausdorff distance between the two sets by 



HDp{A, B) := max I sup distp({a}, P), sup distp({6}, A) J . 



Definition 3. Given a finite sequence of discrete random variables (Xi, . . . , Xm), the 
maximal exchangeability distance is given by 



^^XiX...xX, 



E^e5„, T^reSm (suppX^(i) x . . . x X^(„), suppX^(i) x . . . x X^(„)) 



p{lm,Mm) \Sm\{\Sm\ " 1; 

and the mean exchangeability distance is given by 



XrX...xX„ 



^ E,re5^ Zre5^JE-y.(i)X...xX^(^) [distp (X^(i) X . . . X X^(n,), SUppX^(i) X ... X X^(„,))] 

Mm) \Sm\{\Sm\ " 1) 

Here, 1„ = (1, . . . , 1) and M„ = (M, . . . , M). 
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Clearly, ED^^^^ ,,^^ = ED™-^,...,^^^^^ for any vr G S^, and ED^^^^^,^^ = 

iff (Xi, . . . , Xm) is weakly exchangeable, and the same is true for ED^'^'^ xXm- 

In the rest of this paper we will mainly consider exchangeability of pairs of discrete 
random variables with values in M = {1, . . . , M}. In this special case, since Xi x X2 
is the reflection of X2 x Xi with respect to the line {(x, G M^; ?/ = x}, we get 

Px^xX2 = \\P^XixX2 - Prx2y.xA\ (1) 

j^T^max _ HDp (Supp X X2, SUpp X2 X 

p((l,l),(M,M)) 
;^n— - ^^ixxa [distp (Xi X X2,suppX2 X Xi)] 

p((l,l),(M,M)) • 



3.3 The exchangeability plot 

To illustrate the degree of weak exchangeability of a pair of discrete random variables 
visually we propose the exchangeability plot. The exchangeability plot for the random 
variables Xi and X2 is obtained by depicting both supp Xi x X2 and supp X2 x Xi 
in the same figure. A pair of random variables (Xi,X2) is weakly exchangeable iff 
the two sets overlap completely. Figure [T] shows two exchangeability plots. The pair 
of variables in the left panel is weakly exchangeable, while the pair of variables in 
the right panel is not. Given samples of Xi x X2 and X2 x Xi we can also define 
a sample exchangeability plot, depicting the observed supports. Further details are 
provided in Appendix A. 



3.4 The exchangeability of genes 

In this section we use the terminology developed in the previous sections to define 
the exchangeability of a set of genes with respect to a specific experiment. We will 
also define another measure of exchangeability which is specifically adapted to the 
study of ranked gene lists. 

We assume that we are given a universal set of M genes, denoted Q = {gi, . . . ,gM}- 
We use the word experiment to denote a pair consisting of a population (e.g. cancer 
patients and healthy control subjects) and a variable ranking method (e.g. a t-test 
contrasting the two groups in the population). The sample space Q consists of all 
possible rankings of the M genes, and the random variables Xi,...,Xm : ^ — 
{1, . . . , M} represent the ranking positions of the genes in Q. A finite set of genes 
{gi-^, . . . ,gi^} is said to be (weakly) exchangeable iff the corresponding sequence of 
random variables (Xj^, . . . ,Xj,^) is (weakly) exchangeable. Intuitively, a set of genes 
is exchangeable iff their positions in the variable ranking can be interchanged without 
changing the biological interpretation of the ranking. 

Of course, we do not know the measures Prx^ for the variables in practice, so 
these have to be estimated. In this paper we use subsampling to generate a collection 
of B data sets for which we compute gene rankings. From the B gene rankings, we 
construct a position vector Si for each gene gi by collecting all positions of the gene in 
the B rankings. The elements of a position vector Si are then samples of the random 
variable Xj. Combining two position vectors Si and Sj gives samples of the variables 
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Xi X Xj and Xj x X^ which can be used to obtain estimates of -Pxfxx ' -^-^xfxx 
^■^Xixx - This estimation is discussed further in Appendix A. 

We now introduce another measure of exchangeabihty which is especially adapted 
to the study of ranked gene lists. The rationale behind this measure is that we may 
not want two genes to obtain a small exchangeability distance if they always appear 
in the same order in the rankings. For example, a gene which is always ranked first is 
not highly exchangeable with a gene that is always ranked second, since the first gene 
is clearly more strongly related to the response than the second. The new measure 
penalizes such situations in the computation of the exchangeability distance. For a 
pair of random variables {Xi,X2) we define a new set- valued random variable on Q 

by 

R {Xi X X2) {to) := {{x,y) eMx M; sign {x - y) = sign (Xi (c^') - X2 {lo))}. 

Definition 4. The one-sided mean exchangeability distance for a pair of discrete 
random variables {Xi,X2) is defined by 

_ Ex.xx. [dist, (Xi X X2, suppX^ X X^r^R{Xl x X2))] 
^""""^^x^^- p((l,2),(M-l,M)) 

if suppXa X Xi n R{Xi x X2){uj) ^ for all a; e Q with Prx.xx^i^i x X2{uj)) > 0, 
and oEDxl'^x2 — ^ otherwise. 

Details on the estimation of the one-sided mean exchangeability distance are pro- 
vided in Appendix A. It is also possible to define a one-sided variant of the maximal 
exchangeability distance {oEDx°'xx ) analogous manner. 

We note that due to the normahzation factors introduced in the estimates of weak 
exchangeability, all measures introduced above attain only values in [0, 1] . This allows 
us to define similarity measures {exchangeability scores) for pairs of genes as follows: 

JD cVar 1 JDVar Qmean i ZT" T^mean 77' Qmax i jp r)max 

^'-'XiXXj — ^ ^XiXXp ^'-^XiXXj — ^ ^-I^XiXXp ^'JXiXXj — ^ ^^XiXXp 

oES^^Tx, = 1 - oED^Tx, , oES^--x, = 1 - oED^-^x, ■ 

Finally, we define normalized values of the exchangeability scores by comparing them 
to the corresponding values for pairs of random variables with some pre-specified 
distribution representing a null hypothesis of no association. In this paper, the main 
focus is on weak exchangeability of pairs of discrete random variables, in which case 
it is natural to compare to a random variable Yi x Y2 uniformly distributed on a 
set S C M X M with cardinality equal to that of suppXi x X2. We show only the 
normalization for oESx^^Xj ^ ^^e other scores can be normalized analogously. 

Definition 5. The normalized one-sided mean exchangeability score for a pair of 
discrete random variables (Xi,X2) is defined by 

IT Qmean ^ rp Qmean \ 

0^^X,xX2 -O-^^Y^xY; \ /.X 

1 - oES-^^^^ )^ 

where Yi x 1^2 is a random variable uniformly distributed on a set jS" C M x M with 
IS"! = |suppXi X X2I, and (a)+ = max(a, 0). 
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We note that the measures of exchangeabihty depend on the number of genes in 
the ranking (M). For two genes having the exchangeabihty plot shown in the left 
panel of Figure flj we get noES^^'^x2 ~ ^-^ irrespective of the number of genes since 
in this case, the distance between any value of Xi x X2 and suppX2 x Xi is zero. For 
the exchangeability plot in the right panel of Figure [l] we obtain noES^'^'^x2 ~ ^-^"^ 
if M = 15 and noESY^Tx^ = 0-99 if M = 1, 000. In Appendix H we compute 
exchangeability matrices for some synthetic example data sets. 

4 A general framework for list representation and compari- 
son 

In this section, we present a general framework for list representation and comparison. 
The lists are represented as vectors in M*^, where the entry in position i gives the 
contribution of gene Qi. The vector representation allows us to compare both ranked 
and unranked gene lists within the same framework, using one of the many similarity 
or dissimilarity measures available to compare vectors in M*^. This is an advantage 
compared to existing methods for list comparison, which are specifically designed to 
compare certain types of lists. Our framework also provides a way to determine which 
genes are most important for explaining the similarity between two lists. Assume for 
example that some measure based on the scalar product in M*^ is used to measure 
the similarity between two vectors x and y. Then the value of is a measure of the 
influence of the i'th variable on the similarity between the two lists (see Appendix 
G). Finally, ordering the genes by their weights in the vector gives a new ranking of 
the genes. 

As above, we have a universal set of M genes, Q = {gi, . . . ,gM}, where the genes 
are indexed in a fixed (but otherwise arbitrary) fashion. The universal set can be e.g. 
all genes on a microarray chip. An ordered (unordered) gene list is then an ordered 
(unordered) subset of the universal set. By defining a function 

/ : {positions, exchangeabilities, reliability, ...) 1— ?■ G 

we use information about various characteristics of the given list to create a vector 
representation. 

4.1 General idea 

Let £ C ^ denote a list. If i is ordered and if gene g^ is contained in i, we denote 
its position by ire^i). If gi ^ i, we define ni{i) = 0. For an unordered list, we let 
7ri{i) := Xi{9i)i where is the characteristic function of the set ^. Given a list ^ we 
define a corresponding list matrix Gi as the product of three basic M x M matrices; 

Ge := AeVeWe. 

The three basic matrices are designed to represent different characteristics of i. We 
call Ai the position matrix, Vg the exchangeability matrix and Wi the global weight 
matrix. From the list matrix we form a list vector li := {{le)i, ■ ■ ■ , {le)M), by letting 
{le)i := h{{Gg)i) where {Gi)i denotes the z'th column of Gi and h : — )■ M is 



7 



a summarization function, e.g. a norm. The hst vector will be used as the vector 
representation of the list. Once all lists of interest are represented by vectors in M^, 
we can define the similarity between them e.g. as the cosine of the angle between the 
corresponding list vectors, i.e. 

where • denotes the inner product in R^, and we can obtain a dissimilarity coefficient 
as 

rf(£i,£2) = 1-5(^1,^2). (5) 

Choosing A^, V^, W^^ h and the (dis) similarity coefficient on M*^ suitably, most 
methods currently available for list comparison fit into this general framework. In 
Appendix I we show how this can be done for a collection of well-known methods. 

4.2 The position matrix 

The position matrix is defined as a diagonal matrix that contains information 
about the type of list (ordered or unordered) and the positions of the genes within 
the list. We define the diagonal element {Ai)ii (the position value of gene Qi) via a 
monotone transformation of the ranking statistic of the gene. This means that the 
diagonal elements corresponding to genes in the top of the hst I are high, while the 
genes further down in the list obtain lower values. All genes not in the list are given 
position value zero. We note that in some cases, other choices of position values 
may be better suited for unordered lists, where it may be desirable to give the genes 
different weights, e.g. depending on some external criterion, even though there is no 
specified ordering. 

4.3 The exchangeabiUty matrix 

The exchangeability matrix Vt carries information about the exchangeability between 
the genes in Q in the specific experiment giving the list ^. In most examples in this 
paper, we define the entry (V^)^^ to be the estimated normalized one-sided mean 

— ■ mean 

exchangeability score of Qi and Qj (i.e. noESx^xXj)^ the diagonal elements are 
always 1. If is diagonal, i.e. = Im, then the only non-zero elements in the 
list vector are those corresponding to genes that are actually contained in i and 
consequently only the genes that are present in the list affect its vector representation. 
However, if Ve is not diagonal, there is a possibility that the vector representation 
of the list is extended, i.e. that it contains non-zero entries for genes which are not 
themselves present in the list, but are exchangeable with some gene in the hst. The 
(absolute) weight of a gene in the list can also be increased if it is highly exchangeable 
with a gene with a higher (absolute) position value, since the high exchangeability 
indicates that the genes could as well have switched positions without changing the 
interpretation of the list. 

We note that the general framework for list representation supports any matrix 
of gene similarities in the place of V^. For example, Vi could be defined from some 
kind of expert biological knowledge, e.g. concerning which genes are related to the 
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same biological function. This could be used for example when comparing lists from 
different experiments to each other. Yet another option is to use the positive part of 
the correlation matrix in place of Ve, since a high correlation between the expression 
levels of two genes is often considered to indicate similar biological functions of the 
genes. 



4.4 The global weight matrix Wi 

The global weight matrix We is a diagonal matrix that permits weighting the influence 
of the genes differently, depending on some informativeness or reliability estimate. 
For example, we may wish to downweight the influence of a gene that has a high 
probability to be present in an arbitrarily chosen list, since this gene is unspecific 
and may not give much relevant information about the similarity between a pair of 
hsts. 



5 Applications 



5.1 Data sets 

To illustrate the proposed methods, we use three microarray data sets, which were 
downloaded from http://www.broadinstitute.org/gsea/datasets.jsp. These data sets 
have already been pre-processed by replacing the original probe set IDs with gene 
symbols and summarizing all probe sets mapping to the same gene by the largest value 



for each sample. The two lung cancer data sets were re-analyzed by Subramanian 



et al. (2005). 



• Boston lung cancer data (Bhattacharjee et al.. 2001). This data set contains 
expression measurements of 5,217 genes in 62 lung cancer patients, classified 
according to outcome (good or poor) with 31 observations in each group. 



Michigan lung cancer data (Beer et al. 2002). This data set contains expres- 



sion measurements of the same 5,217 genes as in the Boston lung cancer data, 
for 86 lung cancer patients (24 with poor outcome and 62 with good outcome). 



Diabetes data (Mootha et al. . 2003). This data set contains expression mea- 



surements of 15,056 genes in 17 diabetic patients and 17 control subjects. 



5.2 Stabilization of ranked gene lists 

The main objective for introducing exchangeabilities into the list representation is to 
increase the robustness of the resulting gene list. In this section we evaluate different 
aspects of the stability of the extended gene lists by comparing with non-extended 
lists, lists extended using correlations and lists generated by aggregation. 



5.2.1 Stability of top-A; gene lists 

In many cases, only the top-ranked genes from an experiment are studied further, 
which stresses the importance of obtaining a robust and informative set of top-ranked 
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genes. To study the usefulness of exchangeability stabilization for this purpose we 
apply the following steps to the Boston lung cancer data. 

1. We generate 10 modified data sets by bootstrapping samples from the original 
data set, taking the class labels into account. 

2. For each modified data set, we rank the genes using five different methods: 

(a) Ranking the genes according to their signal-to-noise ratio (SNR) when com- 
paring the two patient groups. Genes positively associated with good out- 
come are placed in the top. Here, 

c AT n( ■\ ^tigood) - niiipoor) 

SNR[i) = — 

ai[gooa) + ai[poor) 

where mi{poor) and ai{poor) denote the mean value and standard deviation 
of gene i in the patients with poor outcome, and mi{good) and ai{good) are 
the corresponding values in the patients with good outcome. Ranking the 
genes by their SNR values gives the non-extended lists. In general, lists 
generated in this or similar ways are those that are used for interpretation 
and biological conclusions. 

(b) Computing the extended list vector as described in Section |4] and ranking 
the variables according to their contribution to the list vector. The position 
matrix is derived from the SNR-based ranking of the genes, by letting 

if SNR{i) > 



with 6^ = 350 and M = 5, 217. We comment on the selection of this func- 
tion and the parameter values in Appendix B. The position vectors are 
computed by subsampling the original data set B = 20 times (each time 
keeping 2/3 of the samples from each group) and ranking the variables by 
their SNR values. From the position vectors we then compute the normal- 

— — — mean 

ized one-sided mean exchangeability scores noES x^xXj ^ gene pairs 
to create the exchangeability matrix. We take the global weight matrix 
Wi = Im- To create the list vector from the list matrix, we define the i'th 
entry of the list vector as the element with the largest magnitude in the i'th 
column of the list matrix Gi. This means that a gene which is strongly ex- 
changeable with a gene with a highly negative position value can be moved 
downwards in the list, so the two extreme ends are treated symmetrically. 
In the final ranking, genes with a highly positive contribution are placed 
in the top and genes with a highly negative contribution are placed in the 
bottom. This gives the extended lists. 

Subsampling the modified data set 100 times, and each time deducing a 
ranking from the SNR values as above. The final ranking is then obtained 
as an aggregate ranking, by computing the median position of each gene in 
the 100 subsample rankings. The gene with the lowest median position is 
placed in the top. This procedure gives the median aggregated lists. We 
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also create aggregated lists by computing the product of the ranks of each 
gene in the 100 subsample rankings and ordering the genes by increasing 
value of the rank product. This gives the rank product aggregated 
lists. 



(d) Computing the extended list vector as described in step 2b, but using 
the positive part of the correlation matrix of the original data set as the 
exchangeability matrix. The genes are ordered by decreasing contribution 
to the resulting list vector. This gives the correlation extended lists. 

3. The correspondence between the lists from different bootstrap data sets are 
visualized through concordance plots. For each of the five ranking methods 
described in step |2} let fk denote the number of genes that are among the 
top-A; in the resulting lists from all bootstrapped data sets. The concordance 
plot depicts fk as a function of /c for /c G {1, . . . , 5217}. If the lists are highly 
reproducible with respect to sampling variation in the underlying data, we get 
fk ^ k for all k. We also construct concordance plots for the reversed lists, 
i.e. letting be the number of genes that are among the bottom-fc in all lists. 
As another measure of the stability of the gene rankings, we compute the mean 
overlap between the top-30 and bottom-30 genes from each pair of bootstrapped 
data sets for each of the five ranking methods defined in step [2j The resulting 
figures are given in Appendix E. 

The top row in Figure |2] shows concordance plots for the gene lists obtained 
by the five ranking methods. It is clear that the exchangeability-extended lists are 
more stable than the lists obtained by the other methods with respect to sampling 
variations in the underlying data set. Notably, the correlation-extended lists are 
less stable than the exchangeability-extended lists, indicating that the correlations in 
this case do not capture the relevant characteristics of the data. The bottom row in 
Figure |2] shows corresponding concordance plots for gene lists extracted from a data 
set where the sample labels have been randomly permuted. These figures show that 
the stability of the extended lists that was noted in the top row is clearly dependent 
on that the gene lists actually share some information. Hence, the stability is not 
due to spurious features unrelated to the discrimination between patients with good 
and poor outcome. For this data set, the correlation between the expression values 
for a pair of genes has little to do with the estimated exchangeability of the genes 
(see Appendix D). 



5.2.2 Stability of distances between ranked lists 

Next, we study the stability of the distance between ranked lists from the three 
different data sets. First, the data sets are adjusted to contain the same genes, 
which leaves 5,149 genes. We then compute the exchangeability matrix for each data 
set (normalized one-sided mean exchangeability scores, position vectors obtained by 
subsampling B = 20 times). For each data set, we construct 10 modified data sets 
by bootstrapping, taking the class labels into account, and compute extended and 
non-extended list vectors for each bootstrapped data set. The pairwise distances 
between all extended (or non-extended, respectively) list vectors are computed using 
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(|5]) and we study the variation of the distances from each comparison. For hst vectors 
from different data sets, the aim is to obtain a robust value of the distance. For hst 
vectors from the same data set the distance should also be close to zero. 

Figure |3] shows histograms of the computed distances for each comparison. It is 
clear that the lists from the same data set are much more similar after extension 
than before (the distances are much closer to zero). Comparing the two lung cancer 
data sets, the extended list vectors are more similar and the distance estimates are 
more stable than without extension. This suggests that the extended list vectors 
incorporate information which is shared between the two data sets and that is missed 
if we only study the top genes. For the comparisons between the lung cancer data 
sets and the diabetes data set, the non-extended list vectors are almost orthogonal 
(implying a dissimilarity around 1), which indicates that the top genes from the data 
sets are completely different. The dissimilarities are close to 1 also after extension, 
which suggests that there is much less shared functional information between a lung 
cancer data set and the diabetes data set than between the two lung cancer data sets. 
In Appendix F we give the corresponding histograms for rankings obtained from data 
sets were the class labels are permuted independently in each bootstrap round. 



5.3 Informativeness of ranked gene lists 

Although stability of gene rankings is an important and desirable property, it is 
not the only thing that is of interest. For example, if we define a ranking method 
which always assigns a gene the same pre-defined position, the ranking would be 
extremely stable but most likely useless. We therefore study the informativeness of 
the rankings obtained as described above by examining the ability of the top-ranked 
genes in each list to discriminate between the two patient groups in the Boston 
lung cancer data. We use ten-fold cross-validation to assess the performance of the 
classifiers. For each training/ test set split we compute the five rankings as described in 



Section 5.2.1 for the training set, and extract the top- and bottom-/c genes from each 
ranking. The expression levels for these genes are centered and standardized based 
on their mean value and standard deviation in the training set. The standardized 
expression levels of the selected genes are then used as features in a centroid classifier 



(Scholkopf and Smola, 2002) which is used to classify the remaining (test) samples. 
The reported classification accuracy is the mean area under the receiver operating 
characteristic curve (AUG) across the 10 training/test set splits. Table [T] shows the 
estimated classification accuracy for the top and bottom genes from the five rankings 
as well as the mean classification accuracy for top and bottom genes from 20 random 
rankings. Note that the top-ranked gene is always the same for the extended and non- 
extended rankings. The classification ability of the top-ranked genes in the extended 
list vectors is considerably higher than for the other methods, indicating that the 
increased stability observed in the previous section does not come at the expense of 
decreased biological significance. 
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Table 1: Classification accuracy (mean AUC across 10 train- 
ing/test set splits) for the union of the top- and bottom-k lists 
obtained from different rankings of the genes in the Boston Lung 
Cancer data with respect to their association with the discrimi- 
nation between patients with good and poor outcome. The best 
performing method for each k is highlighted in bold. 

k = l k = W k = 30 A; = 100 



Extended 


0.344 


0.774 


0.837 


0.832 


Non-extended 


0.344 


0.583 


0.606 


0.566 


Median aggregated 


0.410 


0.565 


0.617 


0.566 


Rank product 


0.400 


0.538 


0.606 


0.566 


Correlation extended 


0.344 


0.594 


0.572 


0.594 


Random 


0.464 


0.489 


0.473 


0.478 



6 Discussion 

Univariate analysis of multivariate genetic data sets usually results in a ranking of 
the variables according to some criterion. This ranking is then interpreted to gain bi- 
ological knowledge and understanding. However, it has been noted that the variable 
rankings are often highly unstable with respect to small changes in the underlying 
data set or the method used to obtain the ranking and therefore, methods for sta- 
bilizing the variable ranking and allowing more robust comparison to other lists are 
much needed. In this paper we have presented a general framework for robust repre- 
sentation and comparison of variable lists. The framework encompasses both ordered 
and unordered lists, which can therefore be compared on similar terms. Having a 
robust measure of similarity between any pair of variable lists can furthermore en- 
able visualization through e.g. multidimensional scaling to obtain a low-dimensional 
visual representation of large collections of lists. We have shown that the extended 
variable lists are more stable than the original variable lists from an experiment, and 
also more stable than lists obtained by aggregation of several lists from subsampled 
data sets. These results suggest that the exchangeability concept for random vari- 
ables may be a suitable tool for quantifying the functional redundancy among genes 
and incorporating this information into the list representation. 

The results from the proposed method can be used in different ways. Given the 
vector representation of the gene lists there are many natural choices of similarity 
and dissimilarity measures that can be applied to compare lists. By ranking the 
genes according to their contribution to the list vector we also obtain a new gene 
ranking which may be used to obtain more robust results with other methods, such 
as e.g. Gene Set Enrichment Analysis (GSEA) ( [Subramanian et 'oT. 2005) to study 



the enrichment of gene sets among the genes most highly related to a response. 
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Figure 1: Exchangeability plots for two pairs of random vari- 
ables. The pair in the left panel is weakly exchangeable, while the 
pair in the right panel is not. 
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Figure 2: Concordance plots for gene lists obtained by the five 
methods described in Section \5^ The top row shows concordance 
plots for the observed data (left paneh top genes, right panel: 
bottom genes) and the bottom row shows corresponding plots 
for data where the class labels have been randomly permuted, 
so that the gene rankings from different bootstrapped data set 
are unrelated. The curves corresponding to the original ranking 
and the two aggregated rankings coincide almost completely in all 
cases, indicating that the rankings obtained by aggregation are not 
more stable than the original rankings with respect to sampling 
variations in the underlying data set. The extended lists provide 
a more stable gene ranking for the observed data. Interestingly, 
using the positive part of the correlation matrix to extend the 
gene lists gives less stable rankings. 



19 



Bost-Bost, extended 



Bost-Mich, extended 



Bost-Diab, extended 







Frequency 


o 

LO 

O 
tN 

O - 


f 




1 

0.0 


1 1 1 1 1 

0.2 0.4 0.6 0.8 1.0 






1 1 1 1 1 

0.6 0.8 1.0 1.2 1.4 




Dissimilarity 








Dissimiiarity 


Mich-Mich, extended 






Mich- 


-Diab, extended 



I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 
Dissimiiarity 

Bost-Bost, nonextended 




I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 
Dissimiiarity 

Mich-Mich, nonextended 




I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 
Dissimiiarity 



<D o 



I 1 1 r 



0.6 0.8 1.0 1.2 1.4 
Dissimiiarity 

Bost-Mich, nonextended 



<D o 



I 1 1 r 



0.6 0.8 1.0 1.2 1.4 
Dissimiiarity 

Mich-Diab, nonextended 



A 



I — I — I — I — I 

0.6 0.8 1.0 1.2 1.4 
Dissimiiarity 



I r 



JUL 
— T 



"1 1 



0.6 0.8 1.0 1.2 1.4 
Dissimiiarity 

Diab-Diab, extended 



3 o 



I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 

Dissimiiarity 
Bost-Diab, nonextended 



<u o 



ilL_ 



I 1 1 r 



0.6 0.8 1.0 1.2 1.4 
Dissimiiarity 

Diab-Diab, nonextended 




I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 



Dissimiiarity 



Figure 3: Histograms of tlie distances computed between ranlced 
lists from the same data set or from different data sets witli and 
without stabihzation through exchangeability extension. Bost - 
Boston lung cancer data, Mich - Michigan lung cancer data, Diab 
- Diabetes data. 
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A Estimating exchangeability distances 



In the main article, we indicate how to estimate the exchangeabihty distances between 
genes based on subsampled data. This section provides a more thorough description. 
Recall that for any gene gi in the universal set Q, we define a random variable 
representing the position of the gene in the gene ranking from an experiment. We 
use subsampling to generate B slightly modified data sets from which we extract 
gene rankings. The observed ranking positions for g^, i.e. the samples of the random 
variable Xj, are collected in a position vector Si. Note that the positions from the 
B rankings must be placed in the same order in all position vectors. Combining two 
position vectors Si and Sj provides samples of Xi x Xj and Xj x Xi, which are used 
to estimate the exchangeability distance between gi and gj. 

First, the sample exchangeability plot for the two genes is obtained by depicting in 

the same plot = {{{Si)k: {Sj)k)}k=i ^^id ^jxi = {{{Sj)k: {Si)k)}k=r If ^^ese point 
sets overlap completely, the two genes are considered to be weakly exchangeable. The 
estimates of ED^"-^^. and ED'^':'^^. are obtained by 



max 

EDv.^f 



mean 

ED^ 



(M,M)) 

Eti h distp ({((g.)fc, {S,)k)} , {{{S,U {S,),)}ti) 
p((l,l), (M,M)) 



We note that since -E^xfxx only considers the largest distance between the two 
sets it is sensitive to outliers which may have detrimental effects when the number 
of samples used to estimate it (B) is small. To estimate the total exchangeability 
variation we first compute Gaussian kernel density estimates of the probability density 
functions of Xi x Xj and Xj x Xi from the samples Sixj and Sjxi- We then compute 
the difference between the two kernel density estimates and integrate the positive 
part (or, equivalently, the negative part) of the difference over M^. The value of the 
integral is taken as the estimate Px^^Xj ■ 

Remark 1. Note that it is also possible to let Si be a binary vector such that {Si)k = 1 
if gi is present in some gene set output from the experiment (e.g. differentially 
expressed genes) and {Si)k = otherwise. 

The one-sided exchangeability distance introduced in the main article penalizes 
gene pairs where one of the genes is always ranked before the other by computing 
distances only between points on the same side of the line {{x,y) G R^; y — x}. 
Given two position vectors Si and Sj of length B, we define 

Rij{k) = {(x, y) e M X M; sign {x - y) ^ sign {{Si)^ - (Sj),)} 

for k = 1, . . . , B. The one-sided mean exchangeability distance between the two genes 
is then estimated by 

„ _ Etiidist,({((-S,)fc,(5,)fc)},{((5,).,(5.),)}t,ni?^.(fc) 

oEDx.xXj - p((l,2),(M-l,M)) 
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^ — ■IILCLLIt 

if {{{Sj)y, {Si)v)}^^i ^ for ^) we define oEDx^^Xj = 1 otherwise. 

To estimate normalized exchangeability scores, e.g. 



— — mean 

noESy.^y 



— — mean - — ~ mean 

oESx^xx, -oESy^y^Yj 

~ mean 
1 - O^^y.xY^ 



J 



for the Boston lung cancer data wc note that for virtually all pairs of variables 
{Xi, Xj), the cardinality of the observed support of Xj x Xj is equal to B. Therefore, 

mean 

to compute oESy^xY ■> repeatedly sample B points uniformly from M x M to form 
suppFj X Yj, and compute the resulting exchangeability scores for Yi x Yj uniformly 

- — ~ mean 

distributed on its support. Then oESy.^y. is taken to be the mean value of the 
estimated exchangeability scores. 



B Constructing the position matrix 

In the main article we use the rank-based function 

' 1 if SNRi,) < 0, 

with 6^ = 350 and M = 5, 217 to construct the position matrix from the positions 
of the genes in the ranking. This function and parameters were chosen heuristically, 
and here we give some motivation behind the choices. First, we want to treat the 
variables which are positively and negatively associated to poor outcome symmetri- 
cally, hence the different expressions for positive and negative SNR. Second, we want 
the position values to be somewhat adapted to the exchangeability scores. If the 
position values decrease too fast, the influence of the exchangeabilities is too high 
since all variables exchangeable with the top gene will be moved close to the top of 
the list. On the other hand, if the position values decrease too slowly, the exchange- 
abilities only have a marginal impact. We found the selected function to provide a 
good trade-off between these two effects. 

We chose a rank-based function since this type of functions can be used also when 
there is no clearly defined ranking statistic. Note that another way of constructing 
the position matrix would have been to choose {Ai)ii ~ SNR{i), which in this case 
gives very similar results (data not shown) . 



C Choosing the weight matrix 

The global weight matrix in the general framework presented in the main article 
can be used to incorporate any information about the quality of the measurements 
corresponding to each gene. In the examples in the paper, we do not have this 
information and hence use = Im- Here, we give some examples of how the weight 
matrix can be chosen. Given a collection of K reference lists, we can choose e.g. 
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where Ki is the number of reference hsts containing gi. This resembles the inverse 
document frequency weighting commonly used in the field of information retrieval. 
With this choice, a gene which is present in a large number of lists obtains a lower 
weight. Another weighting scheme could take into account the stability of the position 
of the gene in the list, which could be estimated through resampling techniques. A 
gene which is often in the same position would then have a large influence on the list 
vector relative to a gene which can be in many different positions. 

D Relationship between positive part of correlations and ex- 
changeabilities in the Boston lung cancer data 

As discussed in the main article, using the positive part of the correlation matrix to 
extend the list vector did not give more stable rankings. Figure |4] shows the relation- 
ship between the positive part of the correlation between the expression values and 
the estimated exchangeabilities for 10,000 randomly chosen gene pairs in the Boston 
lung cancer data. Apparently, in this data set the correlation between expression 
values has little to do with the estimated exchangeability between the genes. 




0.0 0.2 0.4 0.6 0.8 

Positive part of correlation 



Figure 4: Positive part of correlation coefEcient between expres- 
sion values and the estimated exchangeabilities for 10,000 ran- 
domly chosen gene pairs from the Boston lung cancer data. 



23 



E Overlap plots 



The concordance plots shown in the main article give a visualization of the overlap 
among the rankings from all the 10 bootstrapped data sets. Here, we show overlap 
plots, giving the mean overlap between the top- and bottom-30 genes from each pair 
of bootstrapped data sets, for each of the ranking metrics. This emphasizes other 
aspects of the stability since we only consider pairwise overlaps. The top rows in 
Figure |5] show the mean overlap between the top-30 and bottom-30 genes for each 
pair of bootstrapped data sets. It is clear that the top parts of the extended hsts have 
a larger overlap than the non-extended or aggregated lists, further indicating that 
the exchangeability captures relevant biological information. The bottom rows in 
Figure |5] show corresponding plots for random rankings (obtained by independently 
permuting the sample labels in each bootstrapped data set). These figures show that 
in the absence of a functional connection between two gene lists, no lists have the 
same top-ranked genes. 

F Distance between ranked lists from permuted data 

In the main article, we studied the stability of distances between ranked lists from 
different experiments (Figure 3 in the main article). Figure [6] shows corresponding 
plots for random rankings, i.e. for rankings obtained from data sets were the class 
labels are permuted independently in each bootstrap round. The distance between 
the extended list vectors are not more stable than the distance between the non- 
extended list vectors and as expected, the list vectors from the same data set are 
more dissimilar than for the original data (Figure 3 in the main article), since there 
is no shared information between the different rankings. 

G Interpretation of similarity score 

In this section we show how to find the genes which are the most important for 
explaining the similarity between two gene lists. As an example, we take the lists 
obtained from the Boston lung cancer data and the Michigan lung cancer data. We 

— mean 

compute extended lists as described in the main article, using noES x-xXj a mea- 
sure of exchangeability in each data set. Then we compute the dot product between 
the resulting extended list vectors and find the genes with the highest contribution. 
Table |2] shows the genes with the highest contribution, their position in the original 
(non-extended) list and their position in the extended lists from each data set. 

H Synthetic examples 

In this section we give some examples comparing the different measures of exchange- 
ability to each other and to the correlation coefficient, which is often used to quantify 
the relationship between two genes. We expect the exchangeability to highlight other 
relationships than the correlation coefficient since the latter does not take into ac- 
count the specific experiment. 
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Table 2: Genes with highest inRuence on the similarity between 
the extended list vectors from Boston lung data and Michigan 
lung data. 



Position in Position in Position in Position in 

Gene Contribution ext. list Boston ext. list Michigan nonext. list Boston nonext. list Michigan 

CASP4 0.922 5,207 5,208 5,198 5,191 

DBP 0.910 1 19 1 40 

EN02 0.837 5,217 5,165 5,217 5,186 

FADD 0.836 5,162 5,215 5,141 5,215 

KRT18 0.826 5,196 5,183 5,205 5,206 

CR2 0.826 29 8 13 17 

TUBAl 0.819 5,194 5,184 5,199 5,185 

LAMB3 0.798 5,168 5,195 5,168 5,164 

KRT19 0.794 5,160 5,206 5,140 5,204 

BZWl 0.780 5,201 5,158 5,209 5,181 



Example 1. We simulate a synthetic data matrix X G M by letting 

f Af{l, 1) if 1 < i < 10, 1 < J < 20 
^ \ AA(0, 1) otherwise. 

As the variable ranking method we use a univariate t-test contrasting the first 20 
samples against the last 20 samples. Clearly, with respect to this experiment the 
first 10 variables should be highly exchangeable since on a population level, they are 
all equally related to the contrast between the two sample groups. Similarly, there 
should also be a certain degree of exchangeability between the last 40 variables, since 
none of them is related to the response. We generate position vectors for the genes by 
subsampling the original data set B = 50 times, each time keeping 2/3 of the samples 

— - — Var ^ —mean — ——-max . — — — .mean 

from each group. We then compute nPSx xx^ ^^'^x xx^ ^^^x xx^ ^^-^^ x xx^ 

. — — — max 

noESx^xXj positive part of the correlation coefficient between each pair 

of variables. Figure [7] shows the exchangeability matrices and the positive part of 
the correlation matrix, averaged over 10 realizations. All exchangeability measures 
clearly divide the genes into two groups consisting of the first 10 and the last 40 
genes, with only very few nonzero exchangeabilities between genes from different 
groups. Moreover, the first 10 genes are more highly exchangeable than the last 40. 
The correlations do not detect this structure. 

Example 2. We simulate a synthetic data matrix X G such that 

( U{2, 1) if 1 < i < 8, 1 < j < 15 
G < A/'(2, 1) if 9 < i < 16, 16 < J < 30 
y A/'(0, 1) otherwise. 

We assign the first 30 samples to one group and the next 30 samples to another group. 
Hence, the first group of 8 variables and the next group of 8 variables are both related 
to the contrast between the two groups, but the two groups of variables are mutually 
exclusively overexpressed in each sample. This could correspond to a situation where 
these two groups of variables have the same function in the cell and therefore do not 
all have to be overexpressed in a particular sample. The structure of the data matrix 
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is shown in Figure 8(a) , We rank the variables by comparing the two sample groups 



with a univariate t-test. We estimate the exchangeability scores noES 



X^xXj 



between 



the variables from position vectors obtained by subsampling the data set B = 50 
times, each time keeping 2/3 of the samples from each group. Figure 8(b) shows the 
exchangeability matrix and Figure 8(c) shows the positive part of the correlations, 
all averaged over 10 realizations. The exchangeability score detects the equivalence 
of the two groups of 8 variables with respect to the ranking method. The correlation 
does not take the response into account and hence does not find the relationship 
between the two groups of variables. Figure 8(d) shows the exchangeability scores of 
the variable pairs plotted against the corresponding correlations. 



I Methods for list comparison 



In this section, we provide a brief overview of previously proposed methods for list 
comparison. These comprise both methods for comparing two unordered lists, meth- 
ods for comparing two ordered lists and methods for comparing one ordered and one 
unordered list. Reviews can also be found in e.g. Goeman and Biihlmann (2007); 



Song and Black (2008); Ackermann and Strimmer 



(2009); Boulesteix and Slawski 



(2009); Huang et al. (2009). We will also show how a number of the most well-known 



methods can be formulated within the framework presented in the main article by 
tuning the selection of Ai, Ve, Wi, h and the similarity or dissimilarity measure on 



Comparison of gene lists is an essential part of many applications. One such is 
gene set enrichment analysis, where a (usually short) unordered list (a gene set) is 
checked for significant enrichment among genes which are highly related to a response. 
The simplest methods, commonly denoted overrepresentation analysis methods, are 
based on computing the overlap between the gene set and an unordered set of e.g. 
differentially expressed genes. The size of the overlap is then checked for significance 
by comparison to the hypergeometric distribution or its approximation by the bi- 
nomial or distributions (Draghici et a/. 2003 Hosack et al, 2003; Khatri and 



Draghici 2005). The simplicity of these methods have made them the methods of 



choice in many software packages and web tools for gene set analysis. In the same 
spirit, methods based on Venn diagrams have been proposed (Smid et a/., 2003), as 



well as the POG (percentage of overlapping genes) score (Ein-Dor et al. 2006; MAQC 



Consortium 2006[ ). The POG score has recently been extended to take into account 



between the genes (Gong et a/., 2010) 



correlated molecular changes ( Zhang et al. , 2009 ) or known functional relationships 



Another approach to gene set enrichment analysis is to first compute a ranking 
of all the genes from an experiment and then check the genes in the gene set for 
significant enrichment in the top and/or bottom of the ranking. The most well-known 



method is Gene Set Enrichment Analysis (GSEA) (Mootha et al, 2003 Subramanian 



et al, 2005) where a modified Kolmogorov-Smirnov statistic is used to quantify the 



enrichment of a gene set. Other methods for combining individual gene statistics into 
a summary statistic for a gene set, e.g. by simple averaging, have also been pursued 

for a discussion and further references). 



[see e.g. 



Ackermann and Strimmer, 2009 



To e.g. assess the stability of gene rankings from different studies, several authors 
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have proposed methods for comparing two ordered gene hsts. The overlap score 



proposed by Yang et al. (2006) is one such method, which computes a weighted sum 
of the overlap of the top-fc and/or bottom-fc lists for k = 1, . . . , M where M is the 
number of genes in the ranking. By adjusting the weights, genes in the extreme ends of 
the lists can have a higher influence on the stability score than the genes in the middle. 
Viewing the rankings as permutations of {1,...,M} (possibly truncated) several 
methods have been proposed for comparing the corresponding permutations using e.g. 
Spearman's footrule, Kendall distance or the Canberra metric (or a modified variant 



for truncated hsts) (Fagin et al. 2003 Jurman et al. 2008, 2010). Jurman et al. 



(2008) also account somewhat for the relationship between the genes by including 



the possibility to define functionally related gene modules, such that the ranking of 
the variables within such a module does not matter. Other methods for comparing 



top-fc lists taking the ranking into account have been proposed by Pearson (2007) 



and Stiglic and Kokol (2010). 



In the rest of this section, we will show how it is possible to formulate many of 
these methods for gene list comparison in our proposed framework by choosing the 
basic matrices A^, Vg and Wi as well as the summary function h and the similarity 
or dissimilarity measure on x R-^ suitably. It can be noted that most methods 
use Vc = Im, meaning that associations between the genes in the universal set are 
not explicitly taken into account. 

First, we note that if we compare two unordered lists £i and £2 and choose Ve = 
Wi = Im and {A£)ii = Xeidt) for each of the lists, the similarity score defined by the 
cosine of the angle between the list vectors is equal to 



\iini2\ 



This similarity coefficient is the geometric mean of the quantities ^^rPf^ and i^i^^ 



and has been discussed e.g. by Warrens (2008). 



K2I 



I.l Percentage of overlapping genes-related (POGR) 



Zhang et al. (2009) described a novel metric for quantifying the overlap of two un- 



ordered gene sets. Let k denote the number of genes that are shared between the 
hsts ii and £2- Then let 0^12 be the number of genes in ii that are not shared but 
significantly positively correlated with at least one gene in £2- The POGR score is 
then defined by 

P0GR,2 = 

and similarly for P0GR2i. We can compute POGR12 in our framework by choosing 
V^i = Wi^ = Im and 

{Ae,)ii = l{g, G £1} 



for the first list and W, 



'M, 



Hg^ e £2} 

l{gi significantly correlated with gj} 
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for the second hst. We take h{x) = ||a;||oo, and then 

POGRu = 

iK^illi 

Hence, to compute POGR12 we only extend the list vector corresponding to £2- 
Note however that the exchangeability matrix in this case takes both data sets into 
account, since two genes are significantly correlated only if their expression levels are 
significantly correlated in both data sets. The reverse score POGR21 is calculated by 
interchanging the roles of ii and £2- 

1.2 Hypergeometric test (Fisher exact test) 

A hypergeometric test, comparing the overlap between two unordered lists £ and i' 
to what would be expected if they were drawn randomly from the ground set, can be 
performed by letting We = Vg = We = V£> = Im and taking 

{Ae)ii = l{gie i} 

(and similarly for i'). We take h{x) = \\x\\oo and define 

s{i,i') = k-k>. 

This gives the size of the overlap between the lists, which is compared to a hyperge- 
ometric distribution with parameters M, to obtain a p- value. 

1.3 Gene set enrichment analysis (GSEA) 



GSEA (Mootha et ai, 2003 Subramanian et ai, 2005) was developed to estimate 



the enrichment of the genes within a gene set in a ranking of all variables from an 
experiment. Hence, the first list {£) is an ordered list containing all genes in Q, and 
the second list (or gene set) {£') is an unordered list with K genes. For i, we define 



{Ae)ii = \ri 



where r is the correlation or ranking metric used to order the genes from the experi- 
ment and q is the exponent controlling the weights. We take Wi = Ve = Im- 
For £', we choose 

Note that strictly speaking, this is not a function of the position in the list since all 
genes in the unordered gene set have the same position. However, as discussed in the 
main article this may be suitable for gene sets where we wish to incorporate some 
external information (in this case, information regarding the ranking statistic). We 
let 

^y^^Y _ / 1 if G f , gj e f or gi ^ f , gj 



10 otherwise. 

Finally, we choose 



(We 



1 if G 
if 9i ^ 
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Then, we compute the hst matrices Gi and G^i and create the vector representations 
of the hsts by choosing 



M 

1=1 

which means that for the ordered hst, 



{k)i = \ri 

and for the unordered gene set. 



q 



The similarity between the hsts (the enrichment score) is then defined as the maxi- 
mum deviation from zero of 

■ , M 

for m G [1, M], and the significance of the score is estimated by repeatedly permuting 
the sample labels and redoing the calculations. 

1.4 Algebraic comparison of ranked lists 



Jurman et al. (2008) proposed a method for comparing ordered top-A; lists, e.g. to 
estimate the stability of different variable selection methods. Assuming that the two 
ranked lists to be compared (£ and respectively) each contain K genes. We choose 
Wi = V£ = Wf = Vi' = Im and let 



ne{i) if Qi G 
K+1 if (7.^ 



with {Ai') chosen analogously. To create the list vector we use e.g. h{x) = mini<j<jvf 
and we define a dissimilarity measure by 

The authors also introduce so called feature modules, consisting of genes known to 
have similar function. They argue that rank differences within such a module should 
be less penalized that other differences, and therefore propose to make the distance 
independent of the ordering within the given modules. In practice, this is obtained 
by putting the elements of such a module in the same order in all lists by permuting 
the values in A^ and A^i corresponding to the genes in the module. 
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1.5 Reciprocal rank-based comparison of ranked lists 



Pearson (2007) proposed a method for comparing ordered top-fc lists based on recip- 



rocal ranks. We denote the two ordered lists to be compared i and and choose 
We = Ve = We' = Vt = Im- We choose 



{Ae)^ 



1 



if 9i e 
K+1 if 9^ ^ 



1 



and similarly for ^' . Furthermore, we take e.g. h{x) = ||a;||oo5 and choose 

d{e,e') = Wk-k'Wi- 

1.6 Similarity for ordered gene lists 



Yang et al. (2006) presented a method for comparing two rankings £ and of all 



the M genes from an experiment. We let Ve = We = Ve' = We' = Im, and take 
{Ae)ii = 7ie{i) (and similarly for £'). Then take e.g. h{x) = ||x||oo- This gives 

= (7r,(l),...,7r,(M)) 
= (7r,,(l),...,7r,,(M)). 



The preliminary similarity score is defined in (Yang et al, 2006) as 

M 



n=l 



where a is a parameter, On{(^,i') is the overlap between the top-n lists of i and i', 
and f{i) is the list obtained by reversing i. We have 

M M M 

n=l n=l k=max{{li)„,{lf,)„) 

since a gene will contribute to the overlap for all k following its highest position in 
the two lists. The second term in S'^{i,£') can be written accordingly by replacing 
{le)n and (/^')„ with M + 1 — and M + 1 — {l'e)n, respectively. Summation gives 



1 -e- 



i=l 



If it is desirable to allow one list to be the reverse of the other, one can define 
S^iiJ') = max{pS'^{i,i'), (1 - ^)S'^ii, f {£'))) 

with 13^1. 
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Figure 5: Top rows: The mean overlap between the top-30 and 
bottom-30 genes in the Usts obtained by the five ranking methods 
described in the main article, for each pair of bootstrapped Boston 
lung data sets. The overlap between the top and bottom genes 
is larger for extended lists than for lists obtained by the other 
methods. Bottom rows: Corresponding plots for lists extracted 
from data sets with permuted class labels. In this case there 
is no functional connection between the lists which is captured 
by the almost non-existent overlap between the top genes in all 
comparisons. 
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Bost-Bost, extended 



Bost-Mich, extended 



Bost-Diab, extended 
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Figure 6: Histograms of the distances computed between ranked 
lists from the same data set or from different data sets with and 
without stabilization through exchangeability extension, for ran- 
dom rankings. Bost - Boston lung cancer data, Mich - Michigan 
lung cancer data, Diab - Diabetes data. 
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Figure 7: Exchangeabilities and positive part of correlation 
for variables in Example^ averaged across 10 simulations, (a) 

— ^Var mean max mean 

nPSx.xx,- (b)nESx^xXr (c) nESx^xx,- (d) noESx^xXr (e) 

— ■ — max 

noES x^xXj ■ (0 positive part of correlations. 
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(a) 



(b) 




0.0 0.1 0.2 0.3 0.4 0.5 

Correlation 



(c) (d) 

Figure 8: (a) Structure of the data set in Example^ Each row 
represents a variable and each column represents a sample, (b) 
The exchangeability matrix, (c) The positive part of the corre- 
lation matrix, (d) The relationship between the positive part of 
the correlation coefficient and the exchangeability score for the 
variable pairs. All values are averaged across 10 realizations. 
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